Intensive Use of Lexicon and Corpus for WSD
Abstract
The paper addresses the issue of how to use linguistic information in Word Sense Disambiguation (WSD). We introduce a knowledge-driven, unsupervised WSD method that requires only a large corpus previously tagged with POS and very little grammatical knowledge. The WSD process takes into account the syntactic patterns in which the ambiguous occurrence appears, relying on the hypothesis of “almost one sense per syntactic pattern”. Integrating syntactic patterns in this way allows us to obtain, from corpora, paradigmatic and syntagmatic information related to the ambiguous occurrence. We also use variants of EuroWordNet (EWN) information for word senses and different WSD algorithms. We report the results obtained when applying the method to the Spanish lexical sample task in Senseval-2. The methodology is easily portable to other languages.
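As a rough illustration of how paradigmatic information might be drawn from a POS-tagged corpus through syntactic patterns, the sketch below collects the words that fill the same slot as an ambiguous noun in one simple pattern. Everything in it is an assumption made for illustration, not the paper's implementation: the toy English corpus, the tag names, the single noun-preposition-noun pattern, and the function names `noun_prep_noun_patterns` and `paradigmatic_substitutes` are all hypothetical.

```python
# Toy POS-tagged corpus (hypothetical data): each sentence is a list of (word, POS) pairs.
CORPUS = [
    [("the", "DET"), ("bank", "NOUN"), ("of", "ADP"), ("the", "DET"), ("river", "NOUN")],
    [("the", "DET"), ("shore", "NOUN"), ("of", "ADP"), ("the", "DET"), ("river", "NOUN")],
    [("the", "DET"), ("bank", "NOUN"), ("of", "ADP"), ("the", "DET"), ("city", "NOUN")],
    [("the", "DET"), ("mayor", "NOUN"), ("of", "ADP"), ("the", "DET"), ("city", "NOUN")],
]


def noun_prep_noun_patterns(sentence):
    """Yield (head, preposition, dependent) for each NOUN ADP [DET] NOUN sequence."""
    for i, (word, pos) in enumerate(sentence):
        if pos != "NOUN":
            continue
        j = i + 1
        if j >= len(sentence) or sentence[j][1] != "ADP":
            continue
        k = j + 1
        # Skip one optional determiner between the preposition and the noun.
        if k < len(sentence) and sentence[k][1] == "DET":
            k += 1
        if k < len(sentence) and sentence[k][1] == "NOUN":
            yield (word, sentence[j][0], sentence[k][0])


def paradigmatic_substitutes(corpus, target_pattern):
    """Return the words seen in the head slot of the same pattern as the target."""
    head, prep, dep = target_pattern
    substitutes = set()
    for sentence in corpus:
        for h, p, d in noun_prep_noun_patterns(sentence):
            if p == prep and d == dep and h != head:
                substitutes.add(h)
    return substitutes


# The ambiguous noun "bank" gets different substitutes in each pattern.
print(paradigmatic_substitutes(CORPUS, ("bank", "of", "river")))  # {'shore'}
print(paradigmatic_substitutes(CORPUS, ("bank", "of", "city")))   # {'mayor'}
```

In a fuller system, the substitute sets would presumably be compared against the lexicon's sense entries to pick a sense for each pattern. That step is left out here because the abstract does not specify how it is done.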
Related papers
Combining Machine Readable Lexical Resources and Bilingual Corpora for Broad Word Sense Disambiguation
This paper describes a new approach to word sense disambiguation (WSD) based on automatically acquired "word sense division". The semantically related sense entries in a bilingual dictionary are arranged in clusters using a heuristic labeling algorithm to provide a more complete and appropriate sense division for WSD. Multiple translations of senses serve as outside information for automatic tag...
Value for Money: Balancing Annotation Effort, Lexicon Building and Accuracy for Multilingual WSD
Sense annotation and lexicon building are costly affairs demanding prudent investment of resources. Recent work on multilingual WSD has shown that it is possible to leverage the annotation work done for WSD of one language (SL) for another (TL), by projecting Wordnet and sense marked corpus parameters of SL to TL. However, this work does not take into account the cost of manually cross-linking ...
Using Parallel Texts and Lexicons for Verbal Word Sense Disambiguation
We present a system for verbal Word Sense Disambiguation (WSD) that is able to exploit additional information from parallel texts and lexicons. It is an extension of our previous WSD method (Dušek et al., 2014), which gave promising results but used only monolingual features. In the follow-up work described here, we have explored two additional ideas: using English-Czech bilingual resources (as...
Combining EWN and Sense-Untagged Corpus for WSD
In this paper we propose a mixed method for Word Sense Disambiguation, which combines lexical knowledge from EuroWordNet with corpora. The method tries to give a partial solution to the problem of the gap between lexicon and corpus by means of the approximation of the corpus to the lexicon. On the basis of the interaction that holds in natural language between the syntagmatic and the paradigmat...
Class Based Sense Definition Model for Word Sense Tagging and Disambiguation
We present an unsupervised learning strategy for word sense disambiguation (WSD) that exploits multiple linguistic resources including a parallel corpus, a bilingual machine readable dictionary, and a thesaurus. The approach is based on the Class Based Sense Definition Model (CBSDM), which generates the glosses and translations for a class of word senses. The model can be applied to resolve sense amb...
Journal:
- Procesamiento del Lenguaje Natural
Volume 33
Publication date: 2004